PACS 87.10.-e - General theory of biological physics

Abstract. - It is a long-standing question in origin-of-life research whether the information content of replicating molecules can be maintained in the presence of replication errors. Extending standard quasispecies models of non-enzymatic replication, we analyze highly specific enzymatic self-replication mediated through an otherwise neutral recognition region, which leads to frequency-dependent replication rates. We find a significant reduction of the maximally tolerable error rate, because the replication rate of the fittest molecules decreases with the fraction of functional enzymes. Our analysis is extended to hypercyclic couplings as an example of catalytic networks.

Introduction. - According to the RNA world hypothesis [1], prebiotic biochemical life is thought to have emerged through four steps: starting from the primordial non-enzymatic synthesis of nucleotides and their subsequent non-enzymatic polymerization into random RNA, which in a third step would non-enzymatically replicate, natural selection would finally produce a set of functional RNA enzymes (ribozymes), establishing exponential growth and initiating RNA evolution. Despite considerable experimental progress [2,3], as of today no truly self-replicating system has been evolved according to this hypothetical schedule. To assess its intrinsic plausibility, theory has mainly focused on the third step, usually based on the Eigen model [4] for prebiotic evolution: here, autocatalytic self-replication of $L$-nucleotide sequences proceeds non-enzymatically via stepwise template-directed polymerization, with a non-negligible error probability $\mu$ per single nucleotide. Assuming that one specific "master" template replicates with the highest rate $\alpha > 1$, while all other sequences have unit replication rate, it is found that faithful replication of the master is possible only for error probabilities smaller than a critical value $\mu_c \approx \ln\alpha/L$. In this regime, the population in sequence space is concentrated about the master in a rather broad distribution, giving rise to the notion of a "quasispecies". Larger values $\mu > \mu_c$ lead to a delocalized state with completely random sequences in the population. Many aspects of the Eigen model depend to a large extent on the chosen fitness landscape, which assigns replication rates to genotypes. In the case of RNA, it displays a considerable degree of neutrality, because the mapping of sequences to secondary structures is decidedly many-to-one [5]. Still, although not universal, the existence of a critical mutation rate $\mu_c$ is a comparatively robust phenomenon [6,7]. It has been termed "error catastrophe" [8], because it puts possibly irreconcilable simultaneous constraints on the maximally tolerable error probability and the minimal functional sequence length.

Lacking actual observations of freely self-replicating RNA and hence reliable estimates for replication rates, these theoretical limitations of non-enzymatic RNA replication are not yet reasonably quantitative. However, biochemical issues [9] raise severe doubts about its plausibility as well. Although ribozymes have been discovered that catalyze most of the necessary reaction steps [2,3,10,11], it remains questionable how a ribozyme should literally copy itself [10,12]. Enzymatic replication seems the far more likely scenario, in the sense that a ribozyme copies other molecules. Presumably and most effectively, it would copy only those molecules that are exact replicas of itself, not only because known ribozymes act in a very substrate-specific manner, but also because unspecific recognition does not give a selective advantage to the replication enzymes themselves; this would require compartmentalization in vesicles to keep closely related molecules together [12]. Further, it has recently been suggested that the spontaneous emergence of RNA polymerases even without previous non-enzymatic replication could be promoted by a significant increase of functional complexity in a pool of random RNA due to the likely appearance of ligase activity [13].

Fig. 1: Schematic illustration of the model. Left: Molecules with correct structure can replicate non-enzymatically with rate $\alpha > 1$. These molecules can also replicate enzymatically with rate $\gamma$, if they bind specifically to an identical partner within an otherwise selectively neutral recognition region of $\lambda L$ sites (dots). Misfolding mutant molecules replicate with unit rate. This model is formulated in terms of sequences rather than structures, as shown on the right panel: we distinguish correct "structural" nucleotides (*), matching sites in the recognition region (/) and unmatching or random nucleotides (x).

To compare non-enzymatic and enzymatic replication, their competition and their respective tolerance against mutations, both theoretically and by means of stochastic simulations, we employ a simplified quasispecies model, where sequences replicate both non-enzymatically and enzymatically, the latter with high specificity. We find a coexistence regime of these two replication modes, and an escalation of the error catastrophe in the enzymatic case: because the replication rate of the fittest molecules decreases with the fraction of functional enzymes, the maximally tolerable mutation rate is significantly reduced. To make contact with models of modular evolution and catalytic networks, where complex function is assumed to emerge through independent selection of small functional motifs, thereby circumventing the error catastrophe [14,15], we then extend our analysis to the case of hypercycles [8,16,17].

Model. - Motivated by the observation that catalytic and recognition regions are often clearly separated in ribozymes like the RNA component of RNaseP [11], we assume that the specific recognition mediating enzymatic replication involves only a small fraction $\lambda$ of otherwise selectively neutral sites. This means that the majority of sites forms the proper secondary structure of the molecule and builds its active center, which catalyzes the polymerization reactions. Although secondary structure folding algorithms provide an improved genotype-fitness mapping through an excellent approximation to RNA phenotypes, our model is formulated in terms of sequences instead of structures to allow for analytical treatment. We hence distinguish between "structural sites" and a "recognition region" on the sequence level (see Fig. 1 for a schematic illustration of our model). For the former, we use a sharply peaked fitness landscape: a master sequence $S^*$ has the highest non-enzymatic replication rate $\alpha > 1$, while all other sequences replicate with unit rate, which defines the time scale. We ignore possibly neutral sites in the structural region, because on our level of approximation this merely renormalizes their total number, or, equivalently, the mutation probability (see below). However, we do account for mutations in the recognition region, which do not affect non-enzymatic replication but do affect the specificity of enzymatic replication: idealizing "highly specific", we require the recognition regions of enzyme and substrate to be identical for enzymatic replication to take place. Hence, ribozymes replicate only exact copies of themselves, with $\gamma$ the associated rate constant. Note that we do not make any restrictions on the specific sequence of the recognition region: any molecule with the correct sequence for the structural sites can replicate enzymatically if it recognizes a suitable enzyme.

In the following, we formalize this model in the framework of quasispecies theory [4], where molecules are represented by sequences $S_i = (\sigma_1^{(i)}, \dots, \sigma_L^{(i)})$ of $L$ binary nucleotides $\sigma_j \in \{0, 1\}$. Their concentrations $x_i$ evolve in the $L$-dimensional hypercube according to the deterministic rate equations

$$\dot{x}_i = \sum_k m_{ik} r_k x_k - x_i \bar{r}, \qquad (1)$$

where $r_k$ is the replication rate of $S_k$, $m_{ik} = \mu^{d_{ik}} (1 - \mu)^{L - d_{ik}}$ is the mutation probability between sequences $S_i$ and $S_k$ with Hamming distance $d_{ik}$, and $\mu$ is the single-nucleotide mutation probability. The second term in Eq. (1) involves the mean replication rate $\bar{r} = \sum_k r_k x_k$ and ensures the normalization $\sum_k x_k = 1$. According to the model defined above, the replication rates read

$$r_k = \begin{cases} \alpha + \gamma x_k, & \text{if } S_k|_{\mathrm{struc}} = S^*|_{\mathrm{struc}}, \\ 1, & \text{otherwise.} \end{cases} \qquad (2)$$

In Eq. (2), $S_k|_{\mathrm{struc}}$ denotes the restriction of the sequence $S_k$ to the structural sites, and $S^*|_{\mathrm{struc}}$ is the corresponding master sequence. While replication rates are usually taken as functions only of the genotype, with one single peak at the master sequence [4,6,7,18-20], our model leads to frequency-dependent selection, which has only rarely been analyzed because it leads to mathematically challenging replicator-mutator equations (see, e.g., Ref. [21]).
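
As an illustration of Eqs. (1) and (2), the following minimal sketch integrates the full rate equations for a short sequence length where all $2^L$ genotypes can be enumerated. The function names, the choice of the all-zero string as master structure, the explicit Euler scheme and the parameter values ($L = 8$ with 6 structural sites) are assumptions made only for this example and are not taken from the text.

```python
import itertools
import numpy as np

def mutation_matrix(L, mu):
    """Return all 2**L binary sequences and m[i, k] = mu**d_ik * (1 - mu)**(L - d_ik)."""
    seqs = np.array(list(itertools.product([0, 1], repeat=L)))
    d = (seqs[:, None, :] != seqs[None, :, :]).sum(axis=2)  # Hamming distances d_ik
    return seqs, mu**d * (1.0 - mu)**(L - d)

def replication_rates(x, seqs, n_struc, alpha, gamma):
    """Eq. (2): alpha + gamma * x_k if the structural sites match the master, else 1."""
    # Assumption: the master structure is the all-zero string on the first n_struc sites.
    correct = (seqs[:, :n_struc] == 0).all(axis=1)
    return np.where(correct, alpha + gamma * x, 1.0)

def euler_step(x, seqs, m, n_struc, alpha, gamma, dt):
    """One explicit Euler step of Eq. (1)."""
    r = replication_rates(x, seqs, n_struc, alpha, gamma)
    r_mean = r @ x                       # mean replication rate
    x = x + dt * (m @ (r * x) - x * r_mean)
    return x / x.sum()                   # remove accumulated round-off in the normalization

# Illustrative parameters (assumed): 6 structural + 2 recognition sites.
L, n_struc = 8, 6
alpha, gamma, mu = 5.0, 10.0, 0.005
seqs, m = mutation_matrix(L, mu)

x = np.zeros(2**L)
x[0] = 1.0                               # start with all molecules at the master sequence
for _ in range(5000):
    x = euler_step(x, seqs, m, n_struc, alpha, gamma, dt=0.01)

print("master concentration:", x[0])
print("mean replication rate:", replication_rates(x, seqs, n_struc, alpha, gamma) @ x)
```

Because the rates depend on the concentrations themselves, the right-hand side is nonlinear in $x$, which is why a direct time integration is used here instead of an eigenvector computation.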

Stochastic simulation. - For a realization of the full $2^L$-dimensional system Eq. (1) in a finite population of $N$ sequences, we employ the straightforward stochastic simulation algorithm used in Ref. [25]. At each time $t$, each sequence $S_k$, present in $n_k$ copies, has a probability $p_{0,k} = n_k / \sum_i n_i (1 + r_i)$ to be copied without mutations into the population at time $t + 1$, and a probability $p_{\mathrm{mut},jk} = m_{jk} r_k n_k / \sum_i n_i (1 + r_i)$ to be selected and mutated into sequence $S_j$. The population is initialized uniformly at the master sequence, but with random recognition sequences. Because the sites of the recognition sequence are effectively neutral, a state with all possible recognition sequences present in equal concentration is stable for large sequence length [21]. But if number fluctuations sufficiently increase the concentration of one particular yet randomly chosen sequence, this conveys via Eq. (2) a selective advantage, and its concentration will thus increase, up to the extent that mainly this sequence and its next mutational neighbors are present, in a quasispecies distribution very much like the one obtained in usual fitness landscapes. Fig. 2 shows an example of this outcome, which is somewhat reminiscent of a fixation event. While its detailed dependence on the specific formulation of the underlying stochastic process and the parameter values is left for future research, the localization itself turns out to be a robust phenomenon. In the following, we will therefore without loss of generality assume that the recognition region of the most populated sequence is equal to the one of the master sequence $S^*$.

Fig. 2: Exemplary run of the stochastic simulation in a population of $N = 10^3$ sequences for $L = 32$, $\alpha = 5$, $\gamma = 10$, $\lambda = 1/4$, and $\mu = 0.005$. All sequences have been initialized with correct structural region but random recognition region. Their concentration is shown in gray level as a function of time and genotype in the recognition region (linearly arranged by reading bit strings as integer numbers). Spontaneous concentration fluctuations lead to the establishment of a quasispecies of enzymatic replicators centered about one specific yet randomly chosen master sequence. Neighboring sequences are indicated by thin lines.
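
The update rule of this section can be sketched as follows. This is one possible reading of the probabilities $p_{0,k}$ and $p_{\mathrm{mut},jk}$, not necessarily the exact procedure of Ref. [25]: each member of the new population picks a parent with weight $1 + r_k$, which is then either carried over unchanged (probability $1/(1 + r_k)$) or replicated with per-site error probability $\mu$ (probability $r_k/(1 + r_k)$). The function name `generation`, the all-zero master structure and the number of generations are assumptions for illustration.

```python
import numpy as np

def generation(pop, n_struc, alpha, gamma, mu, rng):
    """Advance a population (N x L array of bits) by one time step."""
    N, L = pop.shape
    # Concentration x_k = n_k / N of the genotype carried by each individual.
    _, inverse, counts = np.unique(pop, axis=0, return_inverse=True, return_counts=True)
    x = counts[inverse.ravel()] / N
    # Eq. (2); assumption: the master structure is all zeros on the first n_struc sites.
    correct = (pop[:, :n_struc] == 0).all(axis=1)
    r = np.where(correct, alpha + gamma * x, 1.0)
    # Choose parents with weight (1 + r_k), summing to n_k (1 + r_k) per genotype.
    w = 1.0 + r
    parents = rng.choice(N, size=N, p=w / w.sum())
    # With probability r/(1 + r) the parent is replicated (with errors), else kept as is.
    replicated = rng.random(N) < r[parents] / w[parents]
    flips = (rng.random((N, L)) < mu) & replicated[:, None]
    return pop[parents] ^ flips.astype(pop.dtype)

rng = np.random.default_rng(0)
N, L, n_struc = 1000, 32, 24            # lambda = 1/4 of the sites form the recognition region
pop = np.zeros((N, L), dtype=np.uint8)  # correct structural region ...
pop[:, n_struc:] = rng.integers(0, 2, size=(N, L - n_struc), dtype=np.uint8)  # ... random recognition region

for _ in range(200):
    pop = generation(pop, n_struc, alpha=5.0, gamma=10.0, mu=0.005, rng=rng)

_, counts = np.unique(pop, axis=0, return_counts=True)
print("largest genotype fraction:", counts.max() / N)
```

Flipping each site independently with probability $\mu$ reproduces the mutation matrix $m_{jk}$, including the case of an error-free copy.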

Results. - While analytic solutions to the $2^L$-dimensional system Eq. (1) are hard to obtain, we can use the so-called "error-tail" approximation [19]: here, we introduce three different classes of molecules. In $x_e$, we gather enzymatic replicators identical to the master sequence, with a replication rate $r_e = \alpha + \gamma x_e$. We use a second class $x_n$ for non-enzymatic replicators, with structural sites identical to the master sequence but random recognition sequences. Their replication rate is $r_n = \alpha$: although they are capable of enzymatic replication, the fraction of suitable enzymes with the appropriate recognition region is negligible. Finally, $1 - x_e - x_n$ is the error tail of molecules with incorrect structural sites and unit replication rate. The main simplification of the error-tail approximation is to consider only those mutations that lead into a less fit class, with the probability not to have such a mutation abbreviated as "quality factor" $Q$. This approximation is generally valid for large sequence length but may fail if peaks in the fitness landscape are very dense [7]. The enzymatic replicators in $x_e$ have $Q_e = (1 - \mu)^L \equiv Q$, because a single error in $L$ nucleotides suffices to destroy either structural or recognition region. The non-enzymatic replicators in $x_n$ have a larger quality factor $Q_n = (1 - \mu(1 - \lambda))^L \approx Q^{1-\lambda} > Q$: because the presence of $\lambda L$ neutral sites in the recognition region reduces the effective mutation probability, these sequences are mutationally more robust [22-24]. Further, with probability $Q^{1-\lambda} - Q$ mutations in $x_e$ will hit a site of the recognition region and thus contribute to $x_n$. Hence, the dynamical system in the error-tail approximation is given by

$$\dot{x}_e = r_e Q x_e - x_e \bar{r},$$
$$\dot{x}_n = r_n Q^{1-\lambda} x_n + r_e (Q^{1-\lambda} - Q) x_e - x_n \bar{r}, \qquad (3)$$

where the mean replication rate reads $\bar{r} = (r_e - 1) x_e + (r_n - 1) x_n + 1$. Solutions to the stationary state $\dot{x}_e = \dot{x}_n = 0$ of Eq. (3) for different mutation probabilities $\mu$ are shown in Fig. 3 together with results from a stochastic simulation of the full system with the replication rates Eq. (2) in a population of $N = 10^4$ sequences, where we initialized the sequences uniformly at the master sequence to reduce noise resulting from the intrinsically stochastic "fixation" events shown in Fig. 2 and averaged the results over time after reaching a stationary state. Obviously, approximating the deterministic rate equations with the simplified Eq. (3) gives an excellent description of the stochastic system. We can clearly distinguish three different regimes, separated by two error thresholds.
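
The stationary states of the error-tail system can be explored numerically with a short sketch like the one below. It integrates Eq. (3) by explicit Euler steps, starting from a population consisting only of enzymatic replicators; the function names, the step size and the three sample values of $\mu$ are assumptions chosen for illustration, with the remaining parameters taken from the caption of Fig. 3.

```python
def error_tail_rhs(x_e, x_n, Q, alpha, gamma, lam):
    """Right-hand side of Eq. (3) for the classes x_e and x_n."""
    r_e = alpha + gamma * x_e            # enzymatic replicators
    r_n = alpha                          # non-enzymatic replicators
    r_mean = (r_e - 1.0) * x_e + (r_n - 1.0) * x_n + 1.0
    Q_n = Q ** (1.0 - lam)               # quality factor of the non-enzymatic class
    dx_e = r_e * Q * x_e - x_e * r_mean
    dx_n = r_n * Q_n * x_n + r_e * (Q_n - Q) * x_e - x_n * r_mean
    return dx_e, dx_n

def stationary_state(mu, L=32, alpha=5.0, gamma=10.0, lam=0.25, dt=0.01, steps=50000):
    """Integrate Eq. (3) from (x_e, x_n) = (1, 0) until it has relaxed (assumed long enough)."""
    Q = (1.0 - mu) ** L
    x_e, x_n = 1.0, 0.0
    for _ in range(steps):
        dx_e, dx_n = error_tail_rhs(x_e, x_n, Q, alpha, gamma, lam)
        x_e, x_n = x_e + dt * dx_e, x_n + dt * dx_n
    return x_e, x_n

# One sample value of mu per regime (illustrative choice).
for mu in (0.005, 0.04, 0.1):
    x_e, x_n = stationary_state(mu)
    print(f"mu = {mu}: x_e = {x_e:.4f}, x_n = {x_n:.4f}")
```

Starting from $x_e = 1$ selects the enzymatic branch whenever it exists; a different initial condition may end up in another stationary state.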

For high mutation probability, the population is delocalized over sequence space and only the error tail is significantly populated ($x_e = x_n = 0$). For smaller values of $\mu$, we find a "non-enzymatic regime", where sequences with correct structural region are present, but a stable recognition sequence cannot be maintained, such that enzymatic replication is not possible. Explicitly, we find $x_e = 0$ and $x_n = (\alpha Q^{1-\lambda} - 1)/(\alpha - 1)$. The two regimes exchange stability at $Q = \alpha^{-1/(1-\lambda)}$, corresponding to $\mu = \mu_{c,n} \approx \ln\alpha/(L(1 - \lambda))$. This is the familiar "phenotypic error threshold" [23,24]: the presence of neutral sites renormalizes the effective mutation probability, equivalent to having a shorter sequence [4].
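
As a small worked example, the following sketch evaluates the non-enzymatic threshold for the parameters used in the figures ($L = 32$, $\alpha = 5$, $\lambda = 1/4$), comparing the value obtained by inverting $Q = (1 - \mu)^L = \alpha^{-1/(1-\lambda)}$ exactly with the logarithmic approximation. The function names are introduced here for illustration only.

```python
import math

def mu_c_n_exact(L, alpha, lam):
    """Solve (1 - mu)**L = alpha**(-1/(1 - lam)) for mu."""
    return 1.0 - alpha ** (-1.0 / (L * (1.0 - lam)))

def mu_c_n_approx(L, alpha, lam):
    """Approximation ln(alpha) / (L (1 - lam)), valid for small mu."""
    return math.log(alpha) / (L * (1.0 - lam))

def x_n_stationary(mu, L, alpha, lam):
    """Non-enzymatic stationary concentration x_n = (alpha Q**(1 - lam) - 1) / (alpha - 1)."""
    Q = (1.0 - mu) ** L
    return (alpha * Q ** (1.0 - lam) - 1.0) / (alpha - 1.0)

L, alpha, lam = 32, 5.0, 0.25
print("exact threshold: ", mu_c_n_exact(L, alpha, lam))
print("approx threshold:", mu_c_n_approx(L, alpha, lam))
print("x_n at threshold:", x_n_stationary(mu_c_n_exact(L, alpha, lam), L, alpha, lam))
```

The two threshold values differ by a few percent here, which reflects the expansion $\ln(1 - \mu) \approx -\mu$ behind the approximate formula.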

Fig. 3: Comparison between simulation results for $x_e$ (circles) and $x_n$ (squares) in a population of $N = 10^4$ sequences and solutions to Eq. (3) for $L = 32$, $\alpha = 5$, $\gamma = 10$ and $\lambda = 1/4$. The two error catastrophes occur at $\mu_{c,e} \approx \ln Q_c^{-1}/L$ with $Q_c$ a solution of Eq. (4) and $\mu_{c,n} \approx \ln\alpha/(L(1 - \lambda))$. The inset shows the average Hamming distance $\langle d \rangle$ to the master sequence, which increases in two steps, the first one at $\mu_{c,e}$ discontinuous, the second one at $\mu_{c,n}$ continuous.

For smaller mutation probabilities $\mu < \mu_{c,e}$ the "enzymatic regime" becomes stable. Here, the fraction $x_e$ of enzymatic replicators is nonzero, but $x_n > 0$ as well, because this class is fed from $x_e$ through mutations in the recognition region. Solving a third-order polynomial for $x_e$ and $x_n$, we find this regime is stable when the corresponding discriminant is positive, which yields a critical value $Q = Q_c$ from the condition

$$\begin{aligned} 4\left[3\gamma\left(1 + \alpha Q_c (Q_c^{-\lambda} - 2)\right) - \left(\gamma Q_c + Q_c^{-\lambda} - \alpha\right)^2\right]^3 & \\ + \left[9\gamma\left(\alpha - \gamma Q_c - Q_c^{-\lambda}\right)\left(1 + \alpha Q_c (Q_c^{-\lambda} - 2)\right) + 27\alpha\gamma(\alpha Q_c - 1)(1 - Q_c^{-\lambda}) + 2\left(\gamma Q_c + Q_c^{-\lambda} - \alpha\right)^3\right]^2 &= 0. \qquad (4) \end{aligned}$$
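
The critical value $Q_c$ can also be located numerically. The sketch below is one way to do this under stated assumptions: it uses the cubic for the stationary $x_e$ that follows from eliminating $x_n$ in the error-tail equations (derived here for the example, with coefficients $\gamma$, $-(\gamma Q + Q^{-\lambda} - \alpha)$, $1 + \alpha Q(Q^{-\lambda} - 2)$ and $-\alpha(\alpha Q - 1)(1 - Q^{-\lambda})/\gamma$), bisects on the existence of a physically admissible root, and then evaluates the left-hand side of Eq. (4) at the result. The function names and the assumption that admissible roots exist for all $Q$ above $Q_c$ are illustrative.

```python
import math
import numpy as np

def eq4_lhs(Q, alpha, gamma, lam):
    """Left-hand side of the discriminant condition Eq. (4)."""
    q = Q ** (-lam)
    c = 1.0 + alpha * Q * (q - 2.0)
    s = gamma * Q + q - alpha
    A = 3.0 * gamma * c - s**2
    B = -9.0 * gamma * s * c + 27.0 * alpha * gamma * (alpha * Q - 1.0) * (1.0 - q) + 2.0 * s**3
    return 4.0 * A**3 + B**2

def has_enzymatic_solution(Q, alpha, gamma, lam):
    """True if the stationary cubic has a real root with 0 < x_e <= 1 and x_n >= 0."""
    q = Q ** (-lam)
    coeffs = [gamma,
              -(gamma * Q + q - alpha),
              1.0 + alpha * Q * (q - 2.0),
              -alpha * (alpha * Q - 1.0) * (1.0 - q) / gamma]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-7].real
    # x_n >= 0 requires gamma * x_e > alpha * (q - 1)
    ok = (real > 0.0) & (real <= 1.0) & (gamma * real > alpha * (q - 1.0))
    return bool(ok.any())

def critical_Q(alpha, gamma, lam, tol=1e-10):
    """Bisection; assumes no admissible root at small Q and one for all Q above Q_c."""
    lo, hi = 1e-6, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if has_enzymatic_solution(mid, alpha, gamma, lam):
            hi = mid
        else:
            lo = mid
    return hi

alpha, gamma, lam, L = 5.0, 10.0, 0.25, 32
Q_c = critical_Q(alpha, gamma, lam)
print("Q_c =", Q_c)
print("Eq. (4) left-hand side at Q_c:", eq4_lhs(Q_c, alpha, gamma, lam))
print("mu_c,e ~", math.log(1.0 / Q_c) / L)
```

Eq. (4) may have further roots in $0 < Q < 1$ that correspond to the merging of unphysical (negative) roots of the cubic, which is why the sketch bisects on the admissibility of the roots rather than on the sign of the discriminant alone.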



Asymptotic solutions are given by: 
Note that the large-$\gamma$ limit is $Q_c \simeq 2/\sqrt{\gamma}$, which implies for the corresponding critical value $\mu_{c,e} \approx \ln Q_c^{-1}/L \approx \ln\gamma/(2L)$. This significant reduction by a factor of 2, relative to the value $\ln\gamma/L$ expected for a frequency-independent replication rate $\gamma$, can be phrased as "escalation of error catastrophe": as the fraction $x_e$ of enzymatically replicating sequences drops with higher mutation probability, their replication rate $r_e = \alpha + \gamma x_e$ decreases as well, leading to an even stronger reduction in $x_e$. Beyond the critical value $\mu_{c,e}$, the fraction of molecules with the correct recognition sequence becomes so small that their replication rate is not large enough for them to be maintained in the population at a macroscopic level.

Fig. 4: Phase diagram of stability regimes of Eq. (3) in the $\ln\gamma$-$\mu$-plane: the critical value $\mu_{c,n}$ (thick dashed line) separates the delocalized regime (above) and the non-enzymatic regime (below). Enzymatic replication is stable below $\mu_{c,e}$ (thick solid line) and becomes mutationally more robust than non-enzymatic replication if $\gamma > \gamma^* = O(\alpha^2)$ (vertical line).

An important difference between the transitions at $\mu_{c,n}$ and $\mu_{c,e}$ can be observed not only in the fraction $x_e$, but also in the width of the population distribution (measured as average Hamming distance to the master sequence), shown in the inset of Fig. 3: while the delocalization transition at $\mu = \mu_{c,n}$ is continuous, the transition at $\mu_{c,e}$ is discontinuous. In the former case, this property depends also on the choice of observable [7], but in the latter case, the discontinuity results from bistability: together with the enzymatic regime, also the non-enzymatic regime or the delocalized phase may be stable, depending on whether $\mu$ is smaller or larger than $\mu_{c,n}$, and if $\mu > \mu_{c,e}$ the enzymatic regime vanishes. The phase diagram in Fig. 4 summarizes these various regimes. Note that $\mu_{c,e} = \mu_{c,n}$ at a critical value

$$\gamma^* = \alpha(\alpha - 1) + 2\lambda^{1/2} \alpha \sqrt{2\alpha(\alpha - 1)\ln\alpha} + O(\lambda). \qquad (6)$$
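
For orientation, the two leading terms of Eq. (6) can be evaluated directly. The helper below is only a numerical illustration (its name is made up for this example); it omits the unspecified $O(\lambda)$ correction, which need not be small for $\lambda = 1/4$.

```python
import math

def gamma_star_leading(alpha, lam):
    """First two terms of Eq. (6); the O(lambda) remainder is not included."""
    return alpha * (alpha - 1.0) + 2.0 * math.sqrt(lam) * alpha * math.sqrt(
        2.0 * alpha * (alpha - 1.0) * math.log(alpha))

for lam in (0.0, 0.05, 0.25):
    print(f"lambda = {lam}: gamma* ~ {gamma_star_leading(5.0, lam):.2f}")
```

For $\alpha = 5$ the leading term alone gives $\alpha(\alpha - 1) = 20$, so the value $\gamma = 10$ used in the figures lies below $\gamma^*$, consistent with $\mu_{c,e} < \mu_{c,n}$ there.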

This result implies that very large rates $\gamma = O(\alpha^2)$ are required if enzymatic replication is to be more error-tolerant than non-enzymatic replication [16]. Although this possibility is not contained in the approximate Eq. (3), we find that in the bistability region $\mu < \min(\mu_{c,e}, \mu_{c,n})$ the enzymatic regime is easily populated by selectively advantageous concentration fluctuations from the non-enzymatic regime by randomly choosing a "master" sequence for the recognition region as shown in Fig. 2.

Extension to hypercyclic couplings. - The realization that replication errors limit the maximum complexity of self-replicating molecules to a possibly paradoxical extent has led to theories of modular evolution, where complex functions emerge through catalytic interactions of smaller independently selected motifs [14,15], thereby also speeding up evolution by facilitating the search for complexity. While arbitrarily complex interaction networks between different modules or molecular species are conceivable, the simplest case applicable to the above system with its two-molecule interactions is the hypercycle [8]. Here, $n$ species are arranged in a circular directed graph, where each species enzymatically catalyzes the replication of its next neighbor. This network gives rise to coexistence of all species, in a stable fixed point for $n \le 4$ and via periodic orbits for larger $n$. In contrast to previous approaches accounting for replication errors in a hypercycle [16,17], we consider distinct error tails for all species: each is present in an enzymatically active variant $x_{e,i}$ with replication rate $r_{e,i} = \alpha_i + \gamma_i x_{e,i+1}$ together with its non-enzymatic error tail $x_{n,i}$ with replication rate $r_{n,i} = \alpha_i$. In addition, there is the global error tail of misfolding mutants. For simplicity, we assume a symmetric setup with identical rate constants $\alpha_i = \alpha$ and $\gamma_i = \gamma$. This gives the rate equations

$$\dot{x}_{e,i} = r_{e,i} Q x_{e,i} - x_{e,i} \bar{r},$$
$$\dot{x}_{n,i} = r_{n,i} Q^{1-\lambda} x_{n,i} + r_{e,i} (Q^{1-\lambda} - Q) x_{e,i} - x_{n,i} \bar{r}, \qquad (7)$$

where indices are taken modulo $n$ and the mean fitness is now given by $\bar{r} = (\alpha - 1) \sum_i (x_{e,i} + x_{n,i}) + \gamma \sum_i x_{e,i} x_{e,i+1} + 1$. It is easy to see that this system reduces to Eq. (3) if we assume that $x_{e,i} = x_e^*/n$ and $x_{n,i} = x_n^*/n$ and replace $\gamma \to \gamma/n$. Then, the "enzymatic" solution of Eq. (3) corresponds to the inner fixed point of the hypercycle, where all species are present in equal concentration. Replication errors can be tolerated only if $\mu < \mu_{c,e}$, where $\mu_{c,e} \approx \ln(\gamma/n)/(2L)$ for large $\gamma$ (see also Ref. [16]). We emphasize that in contrast to Ref. [14], where unspecific ligation was assumed to link functional motifs, in our model specific recognition between different species leads to frequency-dependent replication rates. This reduces the error threshold by roughly a factor of 2, which in a two-member hypercycle would cancel the putative complexity gain resulting from using two subunits of half the sequence length.
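
The reduction of the symmetric hypercycle to the single-species error-tail system can be checked numerically. The sketch below integrates the hypercycle rate equations for $n = 3$ from a symmetric initial condition and, in parallel, the single-species error-tail equations with $\gamma$ replaced by $\gamma/n$; the function names, the Euler scheme and the parameter values (chosen such that $\gamma/n = 10$) are assumptions for illustration.

```python
import numpy as np

def hypercycle_rhs(x_e, x_n, Q, alpha, gamma, lam):
    """Hypercycle rate equations for arrays x_e[i], x_n[i]; indices taken modulo n."""
    neighbor = np.roll(x_e, -1)                      # x_{e,i+1}
    r_e = alpha + gamma * neighbor
    r_mean = (alpha - 1.0) * (x_e.sum() + x_n.sum()) + gamma * (x_e * neighbor).sum() + 1.0
    Q_n = Q ** (1.0 - lam)
    dx_e = r_e * Q * x_e - x_e * r_mean
    dx_n = alpha * Q_n * x_n + r_e * (Q_n - Q) * x_e - x_n * r_mean
    return dx_e, dx_n

def reduced_rhs(x_e, x_n, Q, alpha, gamma, lam):
    """Single-species error-tail equations (scalar x_e, x_n)."""
    r_e = alpha + gamma * x_e
    r_mean = (r_e - 1.0) * x_e + (alpha - 1.0) * x_n + 1.0
    Q_n = Q ** (1.0 - lam)
    return (r_e * Q * x_e - x_e * r_mean,
            alpha * Q_n * x_n + r_e * (Q_n - Q) * x_e - x_n * r_mean)

n, L, mu = 3, 32, 0.005
alpha, gamma, lam = 5.0, 30.0, 0.25
Q = (1.0 - mu) ** L
dt = 0.01

x_e, x_n = np.full(n, 1.0 / n), np.zeros(n)          # symmetric start, no error tails
y_e, y_n = 1.0, 0.0                                  # matching start for the reduced system
for _ in range(20000):
    dx_e, dx_n = hypercycle_rhs(x_e, x_n, Q, alpha, gamma, lam)
    x_e, x_n = x_e + dt * dx_e, x_n + dt * dx_n
    dy_e, dy_n = reduced_rhs(y_e, y_n, Q, alpha, gamma / n, lam)
    y_e, y_n = y_e + dt * dy_e, y_n + dt * dy_n

print("hypercycle: sum x_e =", x_e.sum(), " sum x_n =", x_n.sum())
print("reduced:        x_e =", y_e, "     x_n =", y_n)
```

The comparison only probes the symmetric subspace; it says nothing about the stability of the inner fixed point against asymmetric perturbations.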

Moreover, increasing the number of hypercycle members beyond $n = 4$ changes the stability of the central fixed point. Observing that the Jacobian matrix of Eq. (7) is block-circulant [26] (every block is a $2 \times 2$ matrix for the two concentration variables $x_{e,i}$ and $x_{n,i}$ per species), its crucial eigenvalues with possibly non-negative real part are given by $\frac{\gamma}{n} Q x_e^* e^{2\pi i m/n}$, where $m = 0, \dots, n - 1$ and $x_e^*$ is the enzymatic solution of Eq. (3) with $\gamma \to \gamma/n$. Hence, these eigenvalues are proportional to the $n$ different $n$th roots of unity. In close correspondence to the error-free hypercycle (and in contrast to Ref. [16], where a stability region for $n = 5$ was found), the central fixed point loses stability for $n > 4$, giving rise to limit cycles with large concentration oscillations, which are vulnerable to extinction via stochastic fluctuations.
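
The role of the roots of unity can be made concrete with a few lines of code. The sketch below lists, for each $n$, the largest real part among the $n$th roots of unity with $m = 1, \dots, n - 1$. Excluding $m = 0$ is an assumption made here, following the standard treatment of the error-free hypercycle in which that mode is associated with the normalization of the concentrations; the text itself does not spell this out.

```python
import cmath

def max_real_part(n):
    """Largest real part among exp(2*pi*i*m/n) for m = 1, ..., n - 1."""
    return max(cmath.exp(2j * cmath.pi * m / n).real for m in range(1, n))

def classify(n, eps=1e-12):
    """Linear-order verdict for a positive real prefactor multiplying the roots of unity."""
    re = max_real_part(n)
    if re > eps:
        return "unstable"
    if abs(re) <= eps:
        return "marginal"
    return "stable"

for n in range(2, 8):
    print(n, round(max_real_part(n), 3), classify(n))
```

For $n = 4$ the relevant eigenvalues are purely imaginary at this level, so the linear analysis alone does not decide stability; for $n \ge 5$ at least one root has a positive real part.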

Note that the stable inner fixed point corresponding to the enzymatic regime implies coexistence of different species. However, in the non-enzymatic regime of Eq. (7) the different error tails do compete against each other. As soon as the hypercycle breaks down, e.g., because the recognition sequence is lost due to stochastic fluctuations, one error tail will drive the others to extinction, due to the competitive exclusion principle encountered in usual quasispecies theory [4]. This makes the reverse process, i.e., a fluctuation that establishes a closed cycle, extremely unlikely.

Conclusion. - In summary, we have analyzed a simple quasispecies model for the non-enzymatic and enzymatic replication of ribozymes, where specific recognition is mediated via otherwise neutral sites. We find that the frequency-dependent replication rates associated with specific enzymatic replication lead to a discontinuous transition at the error threshold due to bistability with a partly delocalized phase. Further, hypercyclic couplings enable coexistence of at most four different species and their respective error tails in a stable fixed point.

* * * 

Financial support by the Deutsche Forschungsgemeinschaft through SFB TR12 is gratefully acknowledged.

REFERENCES 

[1] Orgel L., Crit Rev Biochem Mol, 39 (2004) 99.
[2] Johnston W., Unrau P., Lawrence M., Glasner M. and Bartel D., Science, 292 (2001) 1319.
[3] Lincoln T. and Joyce G., Science, 323 (2009) 1229.
[4] Eigen M., McCaskill J. and Schuster P., Adv. Chem. Phys., 75 (1989) 149.
[5] Huynen M., Stadler P. and Fontana W., Proc Natl Acad Sci USA, 93 (1996) 397.
[6] Wiehe T., Genet Res Camb, 69 (1997) 127.
[7] Jain K. and Krug J., Adaptation in simple and complex fitness landscapes, arXiv:q-bio/0508008 (2005).
[8] Eigen M. and Schuster P., Naturwissenschaften, 65 (1978) 7.
[9] Orgel L., Nature, 358 (1992) 203.
[10] Joyce G. F., Angew. Chem., 46 (2007) 6420.
[11] Lilley D. M., Curr Opin Struc Biol, 15 (2005) 313.
[12] Szostak J., Bartel D. and Luisi P. L., Nature, 409 (2001) 387.
[13] Briones C., Stich M. and Manrubia S., RNA, 15 (2009) 743.
[14] Manrubia S. C. and Briones C., RNA, 13 (2007) 97.
[15] Takeuchi N. and Hogeweg P., Biol Direct, 3 (2008) 11.
[16] Campos P., Fontanari J. and Stadler P., Phys. Rev. E, 61 (2000) 2996.
[17] Silvestre D. A. M. M. and Fontanari J. F., Journal of Theoretical Biology, 254 (2008) 804.
[18] Schuster P. and Swetina J., Bull Math Biol, 50 (1988) 635.
[19] Nowak M. and Schuster P., J Theor Biol, 137 (1989) 375.
[20] Saakian D. and Hu C., Proc Natl Acad Sci USA, 103 (2006) 4935.
[21] Stadler P., Schnabl W., Forst C. and Schuster P., Bull Math Biol, 57 (1995) 21.
[22] Wilke C., Wang J., Ofria C., Lenski R. and Adami C., Nature, 412 (2001) 331.



Obermayer et al.



[23] Takeuchi N., Poorthuis P. and Hogeweg P., BMC Evol Biol, 5 (2005) 9.
[24] Kun A., Santos M. and Szathmary E., Nat Genet, 37 (2005) 1008.
[25] Wilke C., Ronnewinkel C. and Martinetz T., Phys Rep, 349 (2001) 395.
[26] Tee G., Res. Lett. Math. Sci., 8 (2005) 123.



